fix: prevent apostrophes in docstrings from breaking parsing #695

Merged
boyter merged 1 commit into boyter:master from lawrence3699:fix/docstring-apostrophe-parsing on Apr 13, 2026
@lawrence3699
Contributor

Fixes #246

Problem: docStringState used stringTrie.Match() to detect the end of a docstring. The trie matches any string delimiter, so an apostrophe ' inside a """ docstring was incorrectly matched as a string boundary, causing the state machine to exit the docstring prematurely.
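
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not scc's actual code: `anyDelimiters` and `docstringEndAny` are hypothetical names standing in for a matcher that accepts any string delimiter while scanning a docstring body.

```go
package main

import (
	"fmt"
	"strings"
)

// anyDelimiters is a hypothetical stand-in for the set of string
// delimiters a shared string matcher might recognise.
var anyDelimiters = []string{`"""`, `'''`, `"`, `'`}

// docstringEndAny returns the byte offset at which a scanner that accepts
// ANY delimiter would leave the docstring body, or -1 if none is found.
func docstringEndAny(body string) int {
	for i := range body {
		for _, d := range anyDelimiters {
			if strings.HasPrefix(body[i:], d) {
				return i
			}
		}
	}
	return -1
}

func main() {
	// Body of a """ docstring, i.e. the text after the opening delimiter.
	body := "It's a docstring.\n\"\"\"\nx = 1\n"
	fmt.Println(docstringEndAny(body)) // 2: the apostrophe, not the closing """
}
```

The scan stops at offset 2, the apostrophe in "It's", so everything after it would be treated as being outside the docstring.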

Before:

$ scc test.py   # file contains """ docstring with apostrophe
Python  1  18  1  6  11  0

Lines inside the docstring were counted as code.

After:

$ scc test.py
Python  1  18  2  13  3  1

Docstring lines are correctly counted as comments.

Fix: Replace stringTrie.Match() with checkForMatchSingle(), which checks specifically for the endString that started the current docstring (e.g. """), rather than matching any string delimiter.
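
The idea behind the fix can be sketched as follows. This is an illustration under stated assumptions, not the code in processor/workers.go: `matchesAt` and `docstringEnd` are hypothetical helpers, and the real checkForMatchSingle may have a different signature.

```go
package main

import (
	"bytes"
	"fmt"
)

// matchesAt reports whether content contains exactly endString starting
// at index start. Hypothetical analogue of a single-delimiter check.
func matchesAt(content []byte, start int, endString []byte) bool {
	if start < 0 || start+len(endString) > len(content) {
		return false
	}
	return bytes.Equal(content[start:start+len(endString)], endString)
}

// docstringEnd returns the offset of the first occurrence of endString,
// ignoring every other string delimiter, or -1 if it never appears.
func docstringEnd(content, endString []byte) int {
	for i := range content {
		if matchesAt(content, i, endString) {
			return i
		}
	}
	return -1
}

func main() {
	// Body of a """ docstring, i.e. the text after the opening delimiter.
	body := []byte("It's a docstring.\n\"\"\"\nx = 1\n")
	fmt.Println(docstringEnd(body, []byte(`"""`))) // 18: the closing """
	fmt.Println(docstringEnd(body, []byte(`'''`))) // -1: wrong delimiter never matches
}
```

Because only the delimiter that opened the docstring can close it, the apostrophe at offset 2 is skipped and the scan ends at the real closing delimiter.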

Validation: All existing tests pass (go test ./...). Updated the TestCountStatsIssue123 expectations (which previously encoded the buggy behavior) and added a TestCountStatsIssue246 regression test.
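
As a rough picture of what such a regression check pins down, the sketch below uses a hypothetical, heavily simplified counter (`countDocstringLines`, not part of scc) and asserts that adding apostrophes to a docstring does not change how many lines are attributed to it. It ignores ordinary strings, comments, and escapes, so it is only an illustration of the property being tested.

```go
package main

import (
	"fmt"
	"strings"
)

// countDocstringLines returns how many lines of src touch a triple-quoted
// docstring. Only the delimiter that opened a docstring may close it.
// Hypothetical and simplified: it does not model regular strings or comments.
func countDocstringLines(src string) int {
	delims := []string{`"""`, `'''`}
	open := "" // delimiter of the docstring currently open, "" if none
	count := 0
	for _, line := range strings.Split(src, "\n") {
		rest := line
		inDoc := open != ""
		for {
			if open == "" {
				// Look for the earliest opening delimiter on this line.
				idx, d := -1, ""
				for _, cand := range delims {
					if j := strings.Index(rest, cand); j >= 0 && (idx < 0 || j < idx) {
						idx, d = j, cand
					}
				}
				if idx < 0 {
					break
				}
				open, inDoc = d, true
				rest = rest[idx+len(d):]
			} else {
				// Inside a docstring: only the opening delimiter closes it.
				j := strings.Index(rest, open)
				if j < 0 {
					break
				}
				rest = rest[j+len(open):]
				open = ""
			}
		}
		if inDoc {
			count++
		}
	}
	return count
}

func main() {
	cases := []struct {
		name string
		src  string
		want int
	}{
		{"plain docstring", "def f():\n    \"\"\"\n    Nothing special.\n    \"\"\"\n    return 1\n", 3},
		{"apostrophes inside", "def f():\n    \"\"\"\n    It's Bob's.\n    \"\"\"\n    return 1\n", 3},
		{"double quote inside '''", "'''\nsay \"hi\"\n'''\n", 3},
	}
	for _, c := range cases {
		got := countDocstringLines(c.src)
		fmt.Printf("%-26s got=%d want=%d ok=%t\n", c.name, got, c.want, got == c.want)
	}
}
```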

I explicitly license this contribution under the MIT licence.

In docStringState, the string trie was used to detect the end of a
docstring. Since the trie matches any string delimiter (including '
and "), an apostrophe inside a """ docstring would be incorrectly
matched, causing the state machine to exit the docstring prematurely.

Replace the trie match with checkForMatchSingle, which checks
specifically for the endString that started the current docstring.

Fixes boyter#246

Copilot AI review requested due to automatic review settings, April 12, 2026 09:20
@pr-insights (bot) added labels M/complexity (Normal or medium complexity) and S/size (Small change), Apr 12, 2026
Copilot AI left a comment

Pull request overview

This PR fixes Python docstring parsing so that apostrophes (') inside triple-quoted docstrings (""" / ''') no longer cause the parser to prematurely exit docstring state, which previously led to miscounting docstring lines as code.

Changes:

  • Update docStringState to detect docstring termination by matching only the active docstring’s endString delimiter rather than matching any string delimiter.
  • Adjust the docstring regression test that previously captured the buggy behavior (TestCountStatsIssue123).
  • Add a new regression test for issue #246 to ensure apostrophes inside docstrings don’t break parsing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| processor/workers.go | Fixes docstring end detection by matching only the initiating delimiter (endString) instead of any string delimiter. |
| processor/workers_regression_test.go | Updates prior regression expectations and adds a new test to prevent reintroducing the apostrophe/docstring bug. |

@boyter
Owner

boyter commented Apr 13, 2026

Neat! Thank you! This is one of those issues I have been meaning to fix for a while.

@boyter boyter merged commit f19d8ed into boyter:master Apr 13, 2026
6 of 7 checks passed
Development

Successfully merging this pull request may close these issues.

Python docstrings mishandle apostrophes

3 participants